Abstract — Power law distributions are common in real-world
networks, including online social networks. Many studies on
complex networks focus on the characteristics of vertices, which
are repeatedly shown to follow the power law. However, little
research has been done on edges in directed networks. In
this paper, the edge balance ratio is first proposed to measure the
balance property of edges in directed networks. Based on the edge
balance ratio, the balance profile and positivity are put forward
to describe the balance level of the whole network. Then the
distribution of the edge balance ratio is analyzed theoretically. In
a directed network whose vertex in-degree follows the power
law with scaling exponent γ, it is proved that the edge balance
ratio follows a piecewise power law, with the scaling exponent of
each section depending linearly on γ. The theoretical analysis
is verified by numerical simulations. Moreover, the theoretical
analysis is confirmed by statistics of real-world online social
networks, including a Twitter network with 35 million users and
a Sina Weibo network with 110 million users.

Index Terms — Complex network, online social network, directed
graph, power law, edge balance ratio, balance profile,
positivity, microblogging network.



I. Introduction 
A. Complex Network and Power Law 

Large numbers of real-world systems can be described as
complex networks [1], [2], [3], [4], which are represented
as undirected or directed graphs. Individuals in the system
are represented as vertices and the interactions between them
as edges. The Internet [5], social networks [6], scientific
collaboration networks [7] and protein interaction networks
[8] are several examples. In general, complex networks contain
large numbers of vertices, and the interactions between vertices
are neither purely regular nor purely random. Despite their
different appearances, many networks share similar underlying
properties. Research on complex networks tries to understand
the structure and behavior of networks, which is important
in various areas.

Several characteristics are shared by most real-world
networks. The best-known properties are small-world [9],
[10] and scale-free [11], [12], [13]. In a small-world network,
the average distance between vertices is small and the network
has a high clustering coefficient, which represents the density
of triangles in the network. In a scale-free network, the
degree distribution follows a power law. In other words, the
probability that a vertex has degree k satisfies

$$P(k) \sim k^{-\gamma},$$
where γ is the scaling exponent. Most real-world networks
have the scale-free property, with γ typically
satisfying 2 < γ < 3 [14]. This indicates that in scale-free
networks, a large number of vertices have small degrees, while
a small number of vertices have degrees significantly larger
than the others. Power law distributions exist widely in nature and
attract great attention from researchers. Besides, some real-world
networks have the characteristic of self-similarity [15],
which means that part of the network is similar to the whole
network. Some networks are structured [16] or hierarchical
[17] and can be clustered or divided.

B. Vertices and Edges 

Vertices are important components of complex networks,
as the representations of individuals. Abundant research
has been done on vertices in complex networks. The degree
distribution of vertices is an important property of a network.
The steady-state transition probability of vertices is utilized
in PageRank to measure the importance of vertices [18].
Traditional community detection methods are designed based
on the partition of vertices [19]. Random walks are widely used
for collecting information or recovering structures in large
networks [20]. Many efficient algorithms based on random walks
have been proposed to explore networks, including adaptive
methods [21]. However, most random walk methods are
based on the properties of vertices. Vertices generally attract
more attention than edges in research on complex networks.

Edges are also important components of networks, representing
the interactions between vertices. In computer and
communication networks, edges represent connections. In
social networks, edges represent the relationships between
users. Recently, research on edges has attracted increasing
attention [22]. Link prediction [23], [24] in complex networks
is extensively studied. A random walk based on edges is
introduced in [25], where walkers move between adjacent edges. In
[26], hierarchical clustering on edges is used to detect
overlapping communities in networks, which is a promising
strategy for analyzing graphs. However, research on edges
is still far less abundant than research on vertices in complex
networks, and work on more complex cases such as directed
edges is still scarce.

The line graph, introduced in [27], transforms the edges
and vertices of a network. The line graph is the graph whose
vertices represent the edges of the original graph, with two
vertices connected if the corresponding edges are adjacent.

Although edges of complex networks have attracted attention
from researchers and much work has been done, a lot
remains to be revealed, especially for directed networks.
In this work, the edge balance ratio is proposed as a measure of the
balance property of directed edges, and its distribution is
studied for power law networks.

C. Assortativity and Hierarchy 

Assortativity, or assortative mixing, was first proposed in
[28] to quantify the mixing property of networks. Vertices
in assortative networks tend to connect to vertices with
similar properties, typically similar degrees. On the
contrary, in disassortative networks, high-degree vertices
tend to connect to low-degree vertices. Assortativity is
generally studied for undirected networks; however, approaches
for directed networks are introduced in [29] and [30]. In
[30], four directed assortativity measures are used to quantify
the correlations of the combinations of in-degree and out-degree
separately. For example, the in-out assortativity represents the
correlation between the in-degree of the source vertex and the
out-degree of the target vertex. Local assortativity is proposed in
[31] to measure the assortative level of each individual vertex
in the context of the overall network. Link assortativity is
defined in [32] for directed networks to analyze the assortative
property of directed edges.

Hierarchy is a critical property of networks, especially directed
networks. A technique for inferring hierarchical structure from
network data is presented in [33] to explain the topological
properties of networks. A maximum-likelihood-based method
is proposed in [34] to infer social hierarchy from social
networks. Various hierarchy measures are defined for directed
networks to reveal their hierarchical property [35],
[36], [37]. In [37], a rank is assigned to each vertex, and the
basic idea is to minimize the total "agony" caused by
edges pointing to lower-ranked vertices from higher-ranked
ones. A measure of hierarchy is defined based on the minimal
agony and reveals the hierarchical level of the network.

D. Real-world Networks: Microblogging Networks 

Social networks are a typical kind of complex network and
mostly have the small-world and scale-free properties.
Microblogging networks are an important kind of social network,
with Twitter [38] and Sina Weibo [39] as typical
representatives. Twitter and Sina Weibo have been analyzed in
[40], [41], [42].

Twitter is a widely used online microblogging service.
People can follow others they are interested in on Twitter to build
their personal online social relations. Twitter users can publish
text messages called tweets on their home pages and read
tweets from the users they follow. A tweet is limited to 140
characters and can be published with links and pictures.

The social network built on Twitter is different from the
one in real society because relationships on Twitter can
be unidirectional: one never needs another person's approval
to follow them. Besides, tweets always flow
from the publisher to the users who follow the publisher, and
these users are called followers or fans. On Twitter, if a user
has many followers, what he posts in tweets will be delivered
to a wide range of users.



In China, there are many online social networking services
similar to Twitter. Sina Weibo is the most famous Chinese
microblogging service; "Weibo" is the Chinese word for
microblogging. Sina Weibo has features similar to Twitter, such
as directed relationships and the publishing of text messages, called
weibos, of limited length. It is reported that Sina Weibo
had 324 million registered users by the end of the first quarter
of 2012.

Besides, there are other microblogging services worldwide,
such as Tumblr and Plurk. Some microblogging services are
used locally and support specific languages. Qaiku was
launched in Finland as a Finnish service, while ImaHima
is popular in Japan. In China, FanFou was the first Chinese
microblogging website, and more have appeared since, such as
Tencent Weibo and Sohu Weibo.

A microblogging network is a directed complex network. 
The users are represented as vertices and the relationships are 
represented as directed edges. If user A follows user B, there 
is an edge from vertex A to B in the directed graph. Each edge 
represents a following relationship in microblogging networks. 

E. Our Work 

One main contribution of this work is to propose balance
measures for directed edges and for the whole network, and to
discover the power law property of the edge balance ratio in
a power law network. The edge balance ratio is proposed as a
measure of the balance level of directed edges. The balance profile is
defined to describe the global balance property of the whole
network. The positivity of a network is defined to reveal
its positive level. It is shown theoretically that for
a network with a power law in-degree distribution, the edge
balance ratio follows a piecewise power law distribution whose
scaling exponents are determined by the scaling exponent
of the in-degree. The distribution curve of the edge balance ratio
is wizard-hat shaped. The edge balance ratio is a property of
edges, while the in-degree is a property of vertices; our work
establishes the link between them. Statistics from numerical
simulations and real-world datasets confirm the theoretical
analysis.

The paper is organized as follows. In Section II, some basic
definitions, including the edge balance ratio and balance profile, are
proposed, and the main result on the edge balance ratio
is given with theoretical analysis. In Section III, conditions
and definitions are proposed to establish the network model,
and simulations are performed to verify the theoretical results.
Statistical results on real-world network datasets are presented in
Section IV. Some discussions are given in Section V, and the paper is
concluded in Section VI.

II. Main Result on Edge Balance Ratio 

In this section, the edge balance ratio is proposed as a measure
of the balance level of an edge. Based on the edge balance ratio,
the balance profile and positivity are defined to describe
the global balance property of a directed network. The
distribution of the edge balance ratio for power law networks is
analyzed theoretically, which is the main result of this work.

A. Edge Balance Ratio

This paper focuses on the basic properties of edges. The
in-degree of a vertex is the number of edges pointing to it.
If there is an edge from vertex A to vertex B, we call A
the out-vertex of the edge and, correspondingly, B the
in-vertex. The in-vertex and out-vertex of a directed edge are
not equivalent, which means that edges are unbalanced in a directed
network. To describe the balance level of edges, the edge
balance ratio is defined as a property of directed
edges. For a directed edge from vertex A to vertex B, the edge balance
ratio R is defined as

$$R = \frac{d_i(B)}{d_i(A)},$$

where d_i(B) and d_i(A) are the in-degrees of vertices B and A, respectively.
The logarithmic edge balance ratio is defined as the logarithm
of the edge balance ratio, log R, which is more convenient for describing
the characteristics of the whole network.
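The definition only needs the in-degree of each endpoint, so R and log R can be computed in one pass over an edge list. The sketch below is a minimal illustration, assuming the network is a Python list of (follower, followee) pairs without duplicate edges; the function names are hypothetical. An out-vertex with zero in-degree gives an infinite ratio.

```python
import math
from collections import Counter


def in_degrees(edges):
    """Count d_i(v), the number of edges pointing to each vertex v."""
    return Counter(target for _, target in edges)


def edge_balance_ratios(edges):
    """Map each edge (A, B) to (R, log R) with R = d_i(B) / d_i(A).

    An out-vertex with zero in-degree yields an infinite ratio,
    stored as (math.inf, math.inf).
    """
    deg = in_degrees(edges)
    result = {}
    for a, b in edges:
        if deg[a] == 0:  # Counter returns 0 for vertices nobody follows
            result[(a, b)] = (math.inf, math.inf)
        else:
            r = deg[b] / deg[a]
            result[(a, b)] = (r, math.log(r))
    return result


# Toy follower graph: u and v follow "star", and "star" follows u back.
edges = [("u", "star"), ("v", "star"), ("star", "u")]
ratios = edge_balance_ratios(edges)
```

Here the edge from u to "star" has R = 2, the reciprocal edge has R = 0.5, and the edge from v is infinite because nobody follows v.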

The edge balance ratio reflects the balance property of
edges. In a typical network, the in-degree of a vertex roughly
reflects its importance: in most cases, a vertex with a larger
in-degree is more important in the network. An edge
with a large balance ratio implies that the in-vertex has a
much larger in-degree than the out-vertex, so
the in-vertex is very likely to be more important than the
out-vertex. A balanced edge is one whose in-vertex and out-vertex
have similar importance.

Unbalanced edges are common in real-world networks.
In particular, in a directed network that follows the power law, there
are many edges pointing to vertices with large in-degrees,
and most such edges are unbalanced.

Microblogging networks contain abundant unbalanced
edges. For instance, famous stars attract large numbers of
followers. As a result, most of these edges are extremely
unbalanced, and such edges account for a large proportion
of a microblogging network. The edge balance ratio indicates the
type of an edge in a microblogging network. Edges with
balance ratios far larger than one reflect common following
relationships, in which most users are likely to follow users
more famous than themselves. Edges with balance ratios close
to one represent relationships between friends or people
in similar social positions. Edges with balance ratios far less
than one, in which a highly ranked user follows an ordinary user,
may carry much more information about the network,
reflecting hidden real-world relationships or information
between individuals that is not apparent. Therefore, research on the edge
balance ratio is of great significance for microblogging
platforms.

As an indicator that depends only on local information, the edge
balance ratio can easily be calculated in distributed systems
such as adaptive networks [43], where much work has
been done on activities such as adaptive learning and
diffusion processes [44], [45], [46], [47]. In these systems, the edge
balance ratio can be an important factor for both edges and
vertices.



B. Balance Profile and Positivity 

Since the in-degree approximately reflects the importance of a
vertex in the network, an edge with a balance ratio larger than
one can be called a positive edge. Correspondingly, an edge pointing
from a high in-degree vertex to a low in-degree vertex is a negative
edge. An edge with a balance ratio close to one
can be called a normal edge.
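The three labels overlap as stated, since a ratio slightly above one is both positive and close to one. One way to make the labels exclusive is to let "normal" take precedence inside a band around R = 1. In the sketch below, the band |log R| ≤ tol and its default width are hypothetical choices; the text does not fix a threshold.

```python
import math


def classify_edge(ratio, tol=0.1):
    """Label an edge by its balance ratio R = d_i(B) / d_i(A).

    'normal' when |log R| <= tol (R close to one); otherwise 'positive'
    for R > 1 and 'negative' for R < 1. The tolerance `tol` is a
    hypothetical parameter, not a value taken from the text.
    """
    log_r = math.log(ratio)  # math.log(math.inf) is math.inf -> 'positive'
    if abs(log_r) <= tol:
        return "normal"
    return "positive" if log_r > 0 else "negative"
```

For example, a follower with in-degree 2 following a star with in-degree 10 (R = 5) is a positive edge, and the reverse edge (R = 0.2) is negative.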

For the whole network, the balance profile is proposed as
a global measure, defined as the
distribution of the logarithmic edge balance ratio. The balance
profile reveals the overall trend of the edges in the network.
It carries much more information than a single number, including
the proportions of unbalanced edges at various levels.
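Because the balance profile is a distribution of log R, it can be estimated by a histogram over logarithmically spaced intervals. A minimal sketch, assuming half-open intervals [a^s, a^(s+1)) with interval index s = floor(log_a R); the function name and the default base a = 2 are illustrative assumptions:

```python
import math
from collections import Counter


def balance_profile(ratios, a=2.0):
    """Histogram of logarithmic edge balance ratios.

    Each finite ratio R > 0 is assigned to the counting interval
    [a**s, a**(s + 1)) with s = floor(log_a R), so s < 0 covers R < 1.
    Returns {s: number of edges}, sorted by s; infinite ratios are skipped.
    """
    profile = Counter()
    for r in ratios:
        if math.isfinite(r):
            profile[math.floor(math.log(r) / math.log(a))] += 1
    return dict(sorted(profile.items()))


profile = balance_profile([0.3, 1.0, 1.5, 3.0, 100.0, math.inf])
```

Plotting the counts against a^s on double logarithmic axes gives the kind of profile curves discussed in the rest of this section.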

On the basis of the balance profile, the positivity of a directed
network can be defined as the expectation of the logarithmic
balance ratio over edges with finite values. In a directed network with
N vertices, the logarithmic balance ratio of an edge lies
within [−log(N − 1), log(N − 1)]. The expectation is
normalized by log(N − 1):

$$P = \frac{E\{\log R\}}{\log(N-1)} = \frac{1}{|\mathcal{E}'|\,\log(N-1)} \sum_{(A,B)\in\mathcal{E}'} \log\frac{d_i(B)}{d_i(A)},$$

where ℰ' is the set of edges with finite balance ratios. The
range of positivity is [−1, 1]. As the normalized average of the logarithmic
balance ratios of edges, the positivity reveals the positive level
of the whole network. In particular, for a directed network in which
all edges are bidirectional, the positivity is zero, because the
contributions log(d_i(B)/d_i(A)) and log(d_i(A)/d_i(B)) of each pair
of opposite edges cancel.
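The positivity formula translates directly into code. The sketch below is illustrative only: it assumes a duplicate-free edge list and takes N to be the number of vertices that appear in the list, so isolated vertices are ignored. The function name is hypothetical.

```python
import math


def positivity(edges):
    """Normalized mean logarithmic balance ratio of a directed network.

    P = sum over E' of log(d_i(B) / d_i(A)), divided by |E'| * log(N - 1),
    where E' holds the edges whose out-vertex has nonzero in-degree
    (finite R) and N is the number of vertices appearing in `edges`.
    """
    vertices = set()
    deg = {}
    for a, b in edges:
        vertices.update((a, b))
        deg[b] = deg.get(b, 0) + 1
    finite = [(a, b) for a, b in edges if deg.get(a, 0) > 0]
    n = len(vertices)
    if not finite or n < 3:
        raise ValueError("need at least one finite-ratio edge and N >= 3")
    total = sum(math.log(deg[b] / deg[a]) for a, b in finite)
    return total / (len(finite) * math.log(n - 1))


# Every edge is reciprocated, so opposite edges cancel and P is zero.
bidirected = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 2), (2, 0), (2, 3), (3, 2)]
# Only 2 -> 0 has a finite ratio, d_i(0) / d_i(2) = 3 = N - 1, so P = 1.
hub = [(1, 2), (2, 0), (3, 0), (1, 0)]
```

The two toy networks reproduce the zero-positivity case for bidirectional edges and the upper end of the range [−1, 1].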

Positivity and assortativity reflect the properties of a directed
network from different perspectives, although positivity may
appear similar to in-in assortativity. Assortativity measures
the extent to which vertices link to others with nearly the same
degrees, whereas positivity measures the extent to which vertices
point to ones with higher degrees. An edge from a low in-degree
vertex to a high in-degree vertex and an edge in the opposite
direction both decrease the assortativity of the network, whereas
the former increases the positivity and the latter decreases it.
If the network has a large assortativity, the
balance profile concentrates around the unit edge balance
ratio. Conversely, the balance profile of a disassortative network
lies mainly far from the unit balance ratio. The balance
profile of a network with large positivity lies mostly to the right
of R = 1, while that of a less positive network has more mass
on the left side. Using both positivity and assortativity, the
balance property of a network can be described better.

As an example, the in-degree distributions and balance
profiles of four directed networks with different typical
in-degree distributions are illustrated in Fig. 1. The four networks
have balance profiles of different shapes. The positivities of the
networks are 0.6474, 0.0437, 0.0019 and 0.0068, respectively.
The power law network has a large positivity because the
positive half of its balance profile is significantly higher
than the negative half. The balance profiles of the last two
networks are almost symmetric, which leads to rather small
positivities.

C. Basic Assumptions for the Network 

Power law distribution is the most common and the most 
important distribution in real-world networks. Directed net- 
works following power law are mainly discussed in this work. 
According to the properties of real-world directed networks, 
two basic assumptions are adopted in the network model. 

Assumption 1: The in-degrees of the vertices in the network
approximately follow the power law distribution,

$$n_k \approx A\,k^{-\gamma},$$

where n_k is the number of vertices with in-degree k, A is a
scale factor and γ is the scaling exponent of the power law.

Assumption 2: For any vertex V_0,

$$P\big(d_i(V) = k \mid V \in \mathcal{F}(V_0)\big) \approx P\big(d_i(V) = k\big),$$

where P(d_i(V) = k) is the probability that vertex V has
in-degree k and ℱ(V_0) is the set of followers of vertex V_0.

Assumption 2 means that, on average, there is no bias in the
followers of vertices: the proportion of each type of
follower of a vertex approximately equals that of the whole
network. The assumptions are reasonable for microblogging
networks, and real microblogging data support
the assumptions.
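Assumption 2 can be probed on data by comparing the in-degree distribution of one vertex's followers with that of all vertices. The sketch below uses the total variation distance as the yardstick; that choice, and the function name, are illustrative assumptions rather than the test used in this paper.

```python
from collections import Counter


def follower_bias(edges, v0):
    """Total variation distance between the in-degree distribution of the
    followers of v0 and that of all vertices appearing in `edges`.

    A value near 0 is consistent with Assumption 2 for v0; using total
    variation distance here is an illustrative assumption.
    """
    deg = Counter(b for _, b in edges)
    vertices = {x for edge in edges for x in edge}
    followers = [a for a, b in edges if b == v0]
    if not followers:
        raise ValueError("v0 has no followers")
    overall = Counter(deg[v] for v in vertices)
    local = Counter(deg[a] for a in followers)
    return 0.5 * sum(
        abs(overall[k] / len(vertices) - local[k] / len(followers))
        for k in set(overall) | set(local)
    )


complete = [(u, v) for u in range(3) for v in range(3) if u != v]
biased = [(1, 0), (2, 0)]
```

In the complete graph every follower looks like every other vertex, so the distance is 0; in `biased`, vertex 0 is followed only by vertices that nobody follows, giving a distance of 1/3.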

The assumptions above are regarded as basic properties
of the network and are assumed to hold in the
remainder of this paper.

D. Theoretical Analysis on Edge Balance Ratio 

First, the number of edges with balance ratio R whose
out-vertex has in-degree k is calculated in Lemma 1. Please refer to the
appendix for the proof of Lemma 1.

Lemma 1: If a power law directed network of N vertices
satisfies Assumptions 1 and 2, then the total number of
edges whose out-vertices have in-degree k and whose balance ratio is R
is

$$\frac{A^2}{N}\,k^{1-2\gamma}R^{1-\gamma}.$$
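A heuristic count, not the proof in the appendix, shows where this form comes from. It assumes that about A k^{−γ} vertices have in-degree k, so that P(d_i = k) ≈ A k^{−γ}/N, and treats Rk as an admissible in-degree. An in-vertex with in-degree Rk has Rk followers, and by Assumption 2 a fraction of about A k^{−γ}/N of them have in-degree k:

$$
\begin{aligned}
\#\{\text{in-vertices with } d_i = Rk\} &\approx A\,(Rk)^{-\gamma},\\
\#\{\text{followers of each such vertex}\} &= Rk,\\
\Pr\{\text{a follower has } d_i = k\} &\approx \frac{A\,k^{-\gamma}}{N},\\
\#\{\text{edges with } d_i(\text{out-vertex}) = k \text{ and ratio } R\} &\approx A\,(Rk)^{-\gamma}\cdot Rk\cdot\frac{A\,k^{-\gamma}}{N} = \frac{A^2}{N}\,k^{1-2\gamma}R^{1-\gamma}.
\end{aligned}
$$

Summing over k does not change the R-dependence R^{1−γ}, which is heuristically the exponent of the section with R slightly larger than one.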

The main result on the edge balance ratio is stated as
Theorem 1, which provides an approximation of the
edge balance ratio distribution. Here, we consider only edges
with finite balance ratios.

Theorem 1: If an N-vertex power law directed network
satisfies Assumptions 1 and 2, then, for the logarithmically divided
counting intervals

$$[\cdots, a^{-(s+1)}, a^{-s}, \cdots, a^{-2}, a^{-1}, 1, a, a^{2}, \cdots, a^{s}, a^{s+1}, \cdots]$$

of the edge balance ratio, the distribution of the edge balance ratio
follows a power law piecewise. In detail, up to coefficients that do
not depend on R, the number N_E(R) of edges with balance ratio R satisfies

$$N_E(R) \propto \begin{cases} R^{\gamma-1}, & R \ll 1;\\ R^{\gamma}, & R \lesssim 1;\\ R^{1-\gamma}, & R \gtrsim 1;\\ R^{2-\gamma}, & R \gg 1. \end{cases}$$

The scaling exponents of the four sections are γ − 1, γ, 1 − γ
and 2 − γ, respectively, where γ is the scaling exponent of the
in-degree distribution of the network.

The outline of the proof is presented here, while the details
are included in the appendix. The cases R > 1 and R < 1
are calculated separately.

Considering edge balance ratios satisfying R > 1,
the axis is logarithmically divided into counting intervals by
$[1, a, a^2, \ldots, a^s, a^{s+1}, \ldots]$. For each interval $[a^s, a^{s+1}]$, the
contribution of edge balance ratios R within the interval is the
sum over all combinations (k, m) falling into this interval,
where m is the in-degree of the out-vertex and k is the
in-degree of the in-vertex. Note that R takes discrete values,
so not every value of the edge balance ratio corresponds to an
actual edge. The calculations for intervals
with large s and those with small s differ, so we divide
the calculation into two parts, corresponding to R
far larger than one and R slightly larger than one, respectively.
For R slightly larger than one, the sum fluctuates strongly between
intervals; we focus only on the peak values, which
follow the power law.
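As a rough illustration of the R ≫ 1 part, and not the argument in the appendix, sums can be replaced by integrals. By the same counting as in Lemma 1, about (A²/N) k^{1−γ} m^{−γ} edges have an in-vertex of in-degree k and an out-vertex of in-degree m. Summing k over [a^s m, a^{s+1} m] and then m over m ≥ 1, assuming γ > 3/2 with γ ≠ 2 and ignoring the finite maximum in-degree:

$$
\begin{aligned}
E_s &\approx \frac{A^2}{N}\sum_{m\ge 1} m^{-\gamma}\int_{a^{s}m}^{a^{s+1}m} k^{1-\gamma}\,dk
 = \frac{A^2}{N}\,\frac{1-a^{2-\gamma}}{\gamma-2}\,a^{s(2-\gamma)}\sum_{m\ge 1} m^{2-2\gamma}\\
&\approx \frac{A^2}{N}\,\frac{1-a^{2-\gamma}}{(\gamma-2)(2\gamma-3)}\,a^{s(2-\gamma)},
\end{aligned}
$$

where the last step approximates the sum over m by ∫_1^∞ m^{2−2γ} dm = 1/(2γ − 3). With R = a^s this gives E_s ∝ R^{2−γ}, the R ≫ 1 exponent in Theorem 1; the factor (1 − a^{2−γ})/(γ − 2) is positive for both γ < 2 and γ > 2.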

Similarly, for edge balance ratios less than one,
the counting intervals are

$$[1, a^{-1}, a^{-2}, \cdots, a^{-s}, a^{-(s+1)}, \cdots].$$

For each edge balance ratio interval, we sum over all
combinations (k, m) falling into it. The calculation is again
divided into two parts, and we focus only on the peak values for
R slightly smaller than one.

E. Remarks on the Theoretical Analysis 

Remark 1: The theoretical analysis reveals that the edge
balance ratio obeys the power law piecewise if the in-degree
obeys the power law distribution. The scaling
exponent of the edge balance ratio distribution in each
section is a linear function of the scaling exponent
of the in-degree distribution. The in-degree is a property of vertices,
while the edge balance ratio is a property of edges; thus a
relationship between the statistical properties of vertices and
edges is established.

Remark 2: Positive edges far outnumber negative
ones in a power law network. In other words, there are many
more edges with balance ratios larger than one than edges
with balance ratios smaller than one. The overall trend of the
network is positive, which means that most edges in the
network are positive or normal edges.

In Twitter and Sina Weibo, most edges pointing to a famous
user come from ordinary users, which leads to a large
number of positive edges. By contrast, for a user with few
followers, most of his followers have ranks similar to his,
and few famous users follow him. Consequently,
most edges pointing to him are normal edges.

Remark 3: The distribution over the logarithmic intervals in
Theorem 1 is the balance profile of the network, which reflects
the characteristics of the network. The shape of the balance
profile is determined by the scaling exponent γ.

The balance profile for R < 1 always rises in double
logarithmic coordinates. According to the theoretical
analysis, the slope is γ when R is slightly smaller than one,
while the slope is γ − 1 when R is far smaller than one. Since
the scaling exponent γ of a real-world network is generally
larger than one, both slopes are positive.

Fig. 1. The in-degree distributions and balance profiles of the four directed networks. The balance profiles have different shapes.

The shape of the balance profile for R > 1 is determined by
the sign of γ − 2. The slopes are 1 − γ and 2 − γ for R ≳ 1
and R ≫ 1, respectively. The balance profile always has a
negative slope for R slightly larger than one. For γ larger
than 2, the profile also has a negative slope for R ≫ 1. On the
contrary, this segment of the curve rises if γ is less than
2, in which case edges with larger balance ratios account for a
greater proportion. The critical value of γ is therefore 2, and it
strongly affects the trend of the balance profile.
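The slope rules of Theorem 1 and the critical value γ = 2 can be collected in a small helper. This is a convenience sketch with hypothetical function names, not part of the original analysis.

```python
def profile_slopes(gamma):
    """Log-log slopes of the four balance-profile sections in Theorem 1."""
    return {
        "R << 1": gamma - 1,
        "R slightly < 1": gamma,
        "R slightly > 1": 1 - gamma,
        "R >> 1": 2 - gamma,
    }


def far_tail_rises(gamma):
    """True when the R >> 1 section rises, i.e. its slope 2 - gamma is positive."""
    return 2 - gamma > 0


slopes = profile_slopes(1.9)  # R >> 1 slope is 2 - 1.9 = 0.1
```

For γ = 1.9 the far right section rises with slope 0.1, while for γ = 2.3 it falls with slope −0.3.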

III. Numerical Simulations 

In this section, numerical simulations are performed to verify
the theoretical results. Real-world networks contain a lot of
randomness. To generate networks similar to real-world
ones, two pairs of stochastic conditions are put forward,
introducing randomness into the network model. Based on
these conditions, four types of networks are constructed to
simulate real-world networks. The theoretical analysis is verified
by simulations of various stochastic networks with
different power law scaling exponents, and the balance profiles
are shown to be wizard-hat shaped.

A. Network Models: Stochastic Conditions and Stochastic 
Networks 

In a real-world network, the basic assumptions are satisfied
statistically but not strictly. To better simulate
real-world networks, randomness is introduced into the network
model. The two pairs of conditions below concern different
interpretations of the power law and of the in-degree, respectively.
The total number of vertices in the network is denoted as N,
and A is a scale factor.

1. The number of vertices with in-degree k is A · k^{−γ};

1'. For any vertex, the event that its in-degree is k occurs
with probability (A/N) · k^{−γ};

2. A vertex having in-degree k means that exactly k vertices
point to it;

2'. A vertex having in-degree k means that every other vertex
points to it with probability k/N.

TABLE I
Four Types of Networks.

                 Condition 2                  Condition 2'
Condition 1      Deterministic network        Type II stochastic network
Condition 1'     Type I stochastic network    Type III stochastic network

The first pair of conditions describes the power law in
different senses. Under condition 1, the number of vertices
with in-degree k is strictly determined, whereas condition 1'
introduces randomness and conforms better to real-world networks.
The second pair of conditions introduces randomness into the
definition of in-degree; similarly, condition 2' reflects greater
uncertainty in order to match real-world networks. The two pairs of
conditions describe the network from different aspects and
are independent of each other. The conditions within each
pair are equivalent in an average sense, while 1' and 2' introduce
randomness to simulate real-world networks.

Using combinations of the two pairs of stochastic
conditions, four types of networks are defined in Table I. In the
deterministic network, the in-degree of vertices
follows the power law strictly. The stochastic networks of types
I and II introduce randomness into the power law distribution
and into the in-degree of vertices, respectively. The stochastic
network of type III introduces randomness in both
aspects. The four types of networks can be generated to
simulate real-world networks.
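The four network types can be sketched as a small generator with one switch per pair of conditions. This is an illustration under stated assumptions, not the authors' simulation code: in-degrees range over 1..N−1, the scale factor for condition 1 is the hypothetical choice A = N / Σ k^{−γ}, and rounding leftovers are padded with in-degree 1 or truncated. Function names are hypothetical.

```python
import random


def assign_in_degrees(n, gamma, random_degrees, rng):
    """Target in-degrees k in 1..n-1 for n vertices.

    Condition 1' (random_degrees=True): each vertex draws k independently
    with probability proportional to k**-gamma.
    Condition 1 (random_degrees=False): about round(A * k**-gamma) vertices
    get in-degree k, with the hypothetical choice A = n / sum_k k**-gamma;
    rounding leftovers are padded with in-degree 1 or truncated.
    """
    ks = range(1, n)
    weights = [k ** -gamma for k in ks]
    if random_degrees:
        return rng.choices(ks, weights=weights, k=n)
    scale = n / sum(weights)
    degrees = [k for k, w in zip(ks, weights) for _ in range(round(scale * w))]
    return (degrees + [1] * n)[:n]


def generate_network(n, gamma, random_degrees, random_edges, seed=0):
    """Edge list (follower, followee) for one of the four network types.

    Deterministic: (False, False); type I: (True, False);
    type II: (False, True); type III: (True, True).
    """
    rng = random.Random(seed)
    degrees = assign_in_degrees(n, gamma, random_degrees, rng)
    rng.shuffle(degrees)
    edges = []
    for v, k in enumerate(degrees):
        others = [u for u in range(n) if u != v]
        if random_edges:
            # Condition 2': every other vertex follows v with probability k / n.
            followers = [u for u in others if rng.random() < k / n]
        else:
            # Condition 2: exactly k distinct followers chosen uniformly.
            followers = rng.sample(others, k)
        edges.extend((u, v) for u in followers)
    return edges


type_iii = generate_network(500, 2.3, random_degrees=True, random_edges=True)
```

The follower loop is O(N²), so this sketch suits small N only, not the 400,000-vertex networks used below.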
Fig. 2. In-degree and edge balance ratio distributions of four different types of stochastic networks with γ = 2.3.




Fig. 3. The distributions of edge balance ratio for various scaling exponents γ in the stochastic network of type III.



B. In-degree Distribution of Stochastic Networks 

Four networks with 400,000 vertices each are generated
separately, with power law scaling exponent 2.3. The
sub-figures at the top of Fig. 2 illustrate the in-degree
distributions of the four types of networks defined previously.

The figure shows that randomness is present in the stochastic
networks of types I, II and III, while the in-degree distribution
of the deterministic network is stepwise. Moreover, the largest
in-degrees of the deterministic network and the type II stochastic
network are relatively small, which means that no vertex in these
two networks is followed by a large number of vertices. However,
vertices with large in-degrees appear frequently in real-world
networks. From this perspective, the stochastic networks of types
I and III resemble real-world networks more closely than the others.

C. Numerical and Theoretical Results of Different Stochastic 
Networks 

The numerical and theoretical results are illustrated in the
sub-figures at the bottom of Fig. 2. The networks are generated
with 400,000 vertices and scaling exponent 2.3. The four
sub-figures correspond to the different stochastic networks
described previously. The statistical balance profiles are shown as
points, while the four line segments indicate the theoretical
results. Both theoretical and statistical results are wizard-hat
shaped.



The balance profiles of the four networks have shapes very
similar to those of real-world networks. For all four types of
networks, the theoretical analysis matches the statistical results
well: four power law segments compose the balance profile.



D. Statistical and Theoretical Results of Different Scaling 
Exponents 

The scaling exponent γ has a great influence on the properties
of networks. Power law networks with various scaling exponents
γ are generated to obtain various in-degree distributions.
The stochastic network of type III is adopted in all of the
simulations in this subsection. The statistical and theoretical
results are illustrated in Fig. 3. Each network has
400,000 vertices.

The statistical and theoretical balance profiles are shown
as points and line segments in Fig. 3. A smaller γ yields a
flatter wizard hat, and a larger γ yields a sharper one. The
statistical and theoretical results match well for the various
scaling exponents γ. In particular, for γ = 1.9, the theoretical
analysis shows that the slope of the R ≫ 1 segment should be
2 − γ = 0.1. The statistical balance profile indeed rises as the
balance ratio grows, in agreement with the theoretical result.
This supports the correctness of the theoretical analysis.
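A measured section can be compared with its predicted slope by a least-squares fit on double logarithmic axes. The sketch below is a minimal illustration with a hypothetical function name; the points are synthetic, generated to follow R^{2−γ} for γ = 1.9, and are not data from Fig. 3.

```python
import math


def loglog_slope(points):
    """Least-squares slope of log(count) against log(R).

    `points` is a list of (R, count) pairs with R > 0 and count > 0,
    for example the peak values of one section of a balance profile.
    """
    xs = [math.log(r) for r, _ in points]
    ys = [math.log(c) for _, c in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic R >> 1 section for gamma = 1.9: counts proportional to R**(2 - 1.9).
synthetic = [(2.0 ** s, 1000 * (2.0 ** s) ** 0.1) for s in range(3, 12)]
```

On real profiles the fit should be restricted to the intervals belonging to one section, and only to peak values near R = 1.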
Fig. 4. The in-degree distributions and balance profiles of Twitter and Sina
Weibo.

IV. Statistics of Real-world Networks 

A. Datasets of Microblogging: Twitter and Sina Weibo 

Data from Twitter and Sina Weibo are used as real-world
networks. In-degree distributions and edge balance ratio
distributions are obtained from the two datasets.

Both Twitter and Sina Weibo provide Application Programming
Interfaces (APIs) for developers to collect data.
The Twitter dataset is downloaded from BTI. This dataset
contains 35 million users and 1.4 billion relationships. Sina
Weibo datasets are rare, so we use the APIs to crawl the data
we need. The profile of a Sina Weibo user includes the user ID,
screen name, gender, location and a brief description. Other
data, such as the number of statuses published, who follows
the user and whom the user follows, are also listed. The crawl
began in July 2011, and so far 110 million users and 6.96
billion relationships have been gathered. The crawled users
cover more than 30% of all Sina Weibo users.

B. In-degree Distributions and Balance Profiles of Twitter and 
Sina Weibo 

The top sub-figures of Fig. 4 show the in-degree distributions of the Twitter and Sina Weibo datasets. Both distributions exhibit power-law characteristics. The scaling exponent of Twitter is larger than that of Sina Weibo. The larger the scaling exponent, the sparser the network; accordingly, Sina Weibo users are much more closely connected than Twitter users. Because Sina Weibo has a smaller scaling exponent, celebrities stand out more sharply from ordinary users. Sina Weibo users are mostly from China, whereas Twitter is used globally, so celebrities on Twitter are more dispersed than those on Sina Weibo. This is consistent with the scaling exponents above.
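This section does not state how the scaling exponents were fitted. One common choice, shown here as a substitute and not necessarily the method behind Fig. 4, is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi, and Newman: γ ≈ 1 + n [Σ ln(k_i / (k_min − 1/2))]^(−1) over the n in-degrees k_i ≥ k_min. The function names, the cutoff k_min, and the synthetic sampler used for testing are illustrative assumptions.

```python
import math
import random


def estimate_gamma(degrees, k_min=1):
    """Approximate discrete power-law MLE (Clauset, Shalizi & Newman, 2009):
    gamma ~= 1 + n / sum(ln(k / (k_min - 0.5))) over the n degrees k >= k_min."""
    if k_min < 1:
        raise ValueError("k_min must be at least 1")
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


def sample_degrees(gamma, k_min, n, seed=0):
    """Synthetic test data only: continuous power-law values above k_min - 0.5,
    drawn by inverse-transform sampling and rounded to the nearest integer."""
    rng = random.Random(seed)
    x0 = k_min - 0.5
    return [int(x0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) + 0.5)
            for _ in range(n)]


if __name__ == "__main__":
    degrees = sample_degrees(gamma=2.3, k_min=6, n=50_000, seed=1)
    print(f"estimated gamma: {estimate_gamma(degrees, k_min=6):.3f}")
```

On real in-degree data the cutoff k_min matters; this approximation is known to be less accurate when k_min is small.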



The bottom sub-figures of Fig. 4 show the balance profiles of Twitter and Sina Weibo. Although the theoretical curves resemble the statistics of the real-world networks, they do not match exactly. The in-degree distributions of real-world networks do not strictly follow the power law, and following relationships are not formed by strictly uniform sampling; both discrepancies introduce errors. In particular, the edge balance ratio distribution of Twitter shows a peak at large balance ratios that the theory can hardly predict. The peak arises because lower-ranked users are more inclined to follow very highly ranked users, which deviates from Assumption 2 on following relationships.
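Counting edges per interval is the basic operation behind balance profiles such as those in Fig. 4. The sketch below assumes that R = k_target / k_source, following the Appendix, where k is the in-degree of the followed vertex and m = k/R that of the follower, and that the counting intervals are [a^s, a^(s+1)) for an interval parameter a. Edges whose source has in-degree 0 have no defined ratio and are skipped, and any normalization used in the paper's plots is not reproduced. All names are hypothetical.

```python
import math
from collections import Counter


def in_degrees(edges):
    """In-degree of every vertex in a list of directed (source, target) edges."""
    return Counter(target for _, target in edges)


def interval_index(ratio, a):
    """Index s with a**s <= ratio < a**(s + 1); the tiny offset guards against
    floating-point error when ratio lies exactly on a boundary."""
    return math.floor(math.log(ratio) / math.log(a) + 1e-9)


def balance_profile(edges, a=2.0):
    """Number of edges whose balance ratio R = k_target / k_source falls in
    each counting interval [a**s, a**(s + 1)), keyed by s."""
    deg = in_degrees(edges)
    profile = Counter()
    for source, target in edges:
        if deg[source] == 0:
            continue  # nobody follows the source, so R is undefined
        profile[interval_index(deg[target] / deg[source], a)] += 1
    return dict(sorted(profile.items()))


if __name__ == "__main__":
    edges = [(1, 2), (3, 2), (4, 2), (2, 1), (2, 3)]
    print(balance_profile(edges))
```

For the toy edge list, vertex 2 has in-degree 3 and vertices 1 and 3 have in-degree 1, so two edges land in the interval with s = 1 (R = 3) and two in the interval with s = −2 (R = 1/3).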

V. Discussions 

A. Counting Intervals 

Because the edge balance ratio R takes discrete values and edges are counted in intervals, the cases of R close to one and far from one behave differently. Near a balance ratio of one, the counting intervals are short and only a few distinct edge balance ratios fall into each of them. The edges are therefore unevenly distributed across intervals, and the edge count oscillates strongly. Far from one, each interval contains large numbers of edges with many different balance ratios, so the statistical averaging dominates and the distribution looks smooth.
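The oscillation near R = 1 can be reproduced from Lemma 1 alone. The sketch weights every pair (k, m) by k^(1−γ) m^(−γ), which is the Lemma 1 edge count with the constant A²/N dropped, and accumulates the weights per counting interval. Truncating both in-degrees at k_max is an assumption of the sketch, and the names are hypothetical. With γ = 2.3 and a = 1.2, the interval containing R = 2 receives far more weight than its neighbours, which contain no integer ratio.

```python
import math
from collections import defaultdict


def expected_interval_mass(gamma, a, k_max):
    """Lemma 1 weight k**(1 - gamma) * m**(-gamma) (A**2/N dropped), summed
    per counting interval [a**s, a**(s + 1)) of R = k / m, keyed by s.
    Both in-degrees k (target) and m (source) run from 1 to k_max."""
    log_a = math.log(a)
    mass = defaultdict(float)
    for k in range(1, k_max + 1):
        for m in range(1, k_max + 1):
            s = math.floor(math.log(k / m) / log_a + 1e-9)
            mass[s] += k ** (1 - gamma) * m ** -gamma
    return dict(mass)


if __name__ == "__main__":
    mass = expected_interval_mass(gamma=2.3, a=1.2, k_max=200)
    for s in range(0, 8):
        lo, hi = 1.2 ** s, 1.2 ** (s + 1)
        print(f"[{lo:.3f}, {hi:.3f}): {mass.get(s, 0.0):.4f}")
```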

There is no strict threshold on the abscissa separating R near one from R far from one; the boundary depends on the interval parameter a. The larger a is, the wider the intervals, the more edges fall into each interval, and the more clearly the statistical behavior emerges. As a result, the smooth section is wider. Conversely, the oscillating section is wider when the interval parameter a is small.
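A rough way to see how a shifts the boundary on the R > 1 side uses a heuristic of our own, not part of the analysis above: once an interval [a^s, a^(s+1)) is at least one unit wide, i.e. a^s (a − 1) ≥ 1, it is guaranteed to contain an integer balance ratio, so consecutive intervals can no longer alternate between holding an integer peak and holding none. The hypothetical helper below returns the first such interval.

```python
import math


def smooth_onset(a):
    """Heuristic only: the smallest s >= 0 with a**s * (a - 1) >= 1, i.e. the
    first counting interval [a**s, a**(s + 1)) that is at least one unit wide
    and therefore contains an integer balance ratio. Returns (s, a**s)."""
    if a <= 1:
        raise ValueError("the interval parameter a must exceed 1")
    s = math.ceil(math.log(1.0 / (a - 1.0)) / math.log(a) - 1e-9)
    s = max(0, s)
    return s, a ** s


if __name__ == "__main__":
    for a in (1.25, 1.5, 2.0):
        s, start = smooth_onset(a)
        print(f"a = {a}: intervals from s = {s} (R >= {start:.3f}) are at least one unit wide")
```

Under this criterion, a larger a moves the onset toward R = 1, in line with the wider smooth section described above.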

B. Error Analysis 

Although the theoretical slope in double logarithmic coordinates matches the simulation well, the intercept is less precise, especially when R is much larger than one.

The main sources of error are as follows:

1) Randomness is involved in generating the networks;

2) The estimate of the coefficient A in the power-law representation $y = A x^{-\gamma}$ is not precise enough;

3) Some approximations are used in the theoretical analysis, such as replacing summations with integrals.
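The third error source can be quantified for the peak values in the Appendix, where the sum Σ_{i≥1} i^(1−2γ) is replaced by the integral ∫_1^∞ x^(1−2γ) dx = 1/(2γ − 2). The sketch evaluates both; the exact value is ζ(2γ − 1), computed here from explicit terms plus an Euler-Maclaurin tail correction, which is our choice of method. Because this factor does not depend on R, it shifts the intercept on double logarithmic axes but leaves the slope unchanged.

```python
def peak_constant(gamma, terms=100_000):
    """Return (exact, approx) for the constant sum_{i>=1} i**(1 - 2*gamma).

    exact:  explicit terms i = 1..terms plus an Euler-Maclaurin estimate of the
            remainder, sum_{i>n} i**-p ~ n**(1-p)/(p-1) - n**-p/2 with p = 2*gamma - 1.
    approx: the integral from 1 to infinity, 1 / (2*gamma - 2).
    """
    p = 2.0 * gamma - 1.0
    if p <= 1.0:
        raise ValueError("the sum diverges unless gamma > 1")
    head = sum(i ** -p for i in range(1, terms + 1))
    tail = terms ** (1.0 - p) / (p - 1.0) - 0.5 * terms ** -p
    return head + tail, 1.0 / (2.0 * gamma - 2.0)


if __name__ == "__main__":
    exact, approx = peak_constant(2.3)
    print(f"exact {exact:.4f}, integral {approx:.4f}, ratio {exact / approx:.2f}")
```

For γ = 2.3 the integral drops most of the i = 1 term, so the two constants differ by a factor of roughly three.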

These factors cause errors in the estimated intercept in double logarithmic coordinates. However, for a power-law distribution the intercept is far less critical than the slope: it is the scaling exponent that determines the properties of the network.
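Slope and intercept can be estimated separately by an ordinary least-squares line in double logarithmic coordinates, as sketched below. This is a generic fitting routine and not necessarily the procedure used for the figures. The usage shows the point made above: multiplying all counts by a constant factor, as an intercept error does, moves the fitted intercept but leaves the slope, and hence the scaling exponent, untouched.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares line log10(y) = slope * log10(x) + intercept.

    xs, ys: positive values, e.g. interval positions and edge counts.
    Returns (slope, intercept)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    xs = [2.0 ** s for s in range(3, 12)]
    ys = [1000.0 * x ** -0.3 for x in xs]   # an exact power law
    shifted = [2.5 * y for y in ys]          # same law, wrong prefactor
    print(loglog_fit(xs, ys))       # expected: slope close to -0.3, intercept close to 3
    print(loglog_fit(xs, shifted))  # expected: same slope, intercept larger by log10(2.5)
```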

C. Further Investigations 

A stochastic network is generated with 400,000 vertices and γ = 2.3. The vertex with the largest in-degree has 30,997 followers. If the vertices are sorted by in-degree in descending order, the vertices at the 0.1%, 1%, and 5% positions from the top have 130, 22, and 7 followers, respectively. These vertices, together with vertices with in-degrees 4, 3, 2, and 1, are selected, and the edges related to them are investigated further. The balance ratios of these edges are shown in Fig. 5. Each sub-figure shows the edges of all the vertices with the specified in-degree. The red points represent edges starting from these vertices, while the blue points represent edges pointing to them.

Fig. 5. The edge balance ratio distributions for edges related to vertices with in-degrees 30997, 130, 22, 7, 4, 3, 2, and 1.
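The per-in-degree selection behind Fig. 5 can be sketched as follows, again assuming R = k_target / k_source. For a chosen in-degree k, out-edges (red) start at a vertex with in-degree k and in-edges (blue) end at one. The function name and the toy edge list are hypothetical.

```python
from collections import Counter


def ratios_for_in_degree(edges, k):
    """Balance ratios R = k_target / k_source of the edges touching vertices
    with in-degree k: (out_ratios, in_ratios), i.e. red and blue in Fig. 5."""
    deg = Counter(target for _, target in edges)
    out_ratios, in_ratios = [], []
    for source, target in edges:
        if deg[source] == 0:
            continue  # R undefined when nobody follows the source
        ratio = deg[target] / deg[source]
        if deg[source] == k:
            out_ratios.append(ratio)  # edge starts at an in-degree-k vertex
        if deg[target] == k:
            in_ratios.append(ratio)   # edge points to an in-degree-k vertex
    return out_ratios, in_ratios


if __name__ == "__main__":
    edges = [(1, 2), (3, 2), (4, 2), (2, 1), (2, 3)]
    for k in (1, 3):
        print(k, ratios_for_in_degree(edges, k))
```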

For vertices with different in-degrees, the distributions of in-edges have similar shapes, and so do the distributions of out-edges. This follows from Assumption 2: the in-degree distribution of the followers of a vertex is approximately the same as that of all vertices in the network.

As the in-degree of the selected vertices increases, the red points move to the left and the blue points move to the right. For the red out-edges, the selected vertex is the source, so a larger in-degree of the vertex gives smaller edge balance ratios. Similarly, for the blue in-edges, the selected vertex is the target, so a larger in-degree gives larger balance ratios. The wizard-hat-shaped balance profile is the superposition of all such statistics. Vertices with in-degrees 1, 2, 3, and 4 contribute critically because they are extremely numerous, even though each has only a few followers. Correspondingly, the vertex with the most followers also contributes substantially to the balance ratio statistics, even though there is only one such vertex in the network.

Fig. 6 shows the average in-degrees of the in-vertices and out-vertices of the edges in each counting interval. For edges with large balance ratios, the in-degrees of the out-vertices vary little, and the differences in balance ratio come from differences among the in-vertices. Correspondingly, for edges with small balance ratios, the differences among the out-vertices dominate. Therefore, a large edge balance ratio arises mainly because the in-vertex has a large in-degree, whereas a very small edge balance ratio arises mainly because the out-vertex has a large in-degree.
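The quantity plotted in Fig. 6 can be computed in one pass over the edge list. The sketch assumes R = k_target / k_source, reads the in-vertex of an edge as its target and the out-vertex as its source, and uses intervals [a^s, a^(s+1)); these conventions and the function name are our assumptions.

```python
import math
from collections import Counter, defaultdict


def mean_degrees_per_interval(edges, a=2.0):
    """For each counting interval s of R = k_target / k_source, return
    (mean in-degree of in-vertices/targets, mean in-degree of out-vertices/sources)."""
    deg = Counter(target for _, target in edges)
    acc = defaultdict(lambda: [0, 0, 0])  # s -> [edges, sum k_target, sum k_source]
    for source, target in edges:
        if deg[source] == 0:
            continue  # R undefined when nobody follows the source
        ratio = deg[target] / deg[source]
        s = math.floor(math.log(ratio) / math.log(a) + 1e-9)  # a**s <= R < a**(s+1)
        acc[s][0] += 1
        acc[s][1] += deg[target]
        acc[s][2] += deg[source]
    return {s: (t / n, u / n) for s, (n, t, u) in sorted(acc.items())}


if __name__ == "__main__":
    edges = [(1, 2), (3, 2), (4, 2), (2, 1), (2, 3)]
    print(mean_degrees_per_interval(edges))
```

On real data, plotting the two means against a**s reproduces the kind of comparison discussed above: which endpoint's in-degree drives the ratio in each interval.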
Fig. 6. Edge balance ratio distribution and average in-degrees of in-vertices and out-vertices.



VI. Conclusion 



In directed networks, the edge balance ratio is an important measure of the balance property of individual edges, while the balance profile describes the balance level of the whole network. If the in-degree distribution follows a power law, the edge balance ratio follows a piecewise power law with a wizard-hat shape. Numerical simulation results confirm the theoretical analysis. Edge balance ratio statistics obtained from real-world Twitter and Sina Weibo datasets are broadly consistent with the theoretical results. The paper also discusses the effect of the counting intervals, the sources of error, and the contributions of vertices with different in-degrees.






VII. Appendix 

A. The Proof of Lemma 1 

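Proof: By Assumption 1, there are $A k^{-\gamma}$ vertices with in-degree k in the network. According to Assumption 2, among the followers of a vertex with in-degree k, there are $\frac{k}{N} A m^{-\gamma}$ vertices having m followers. Consequently, the number of edges whose target vertices have in-degree k and whose source vertices have m followers is

$$A k^{-\gamma} \cdot \frac{k}{N} A m^{-\gamma} = \frac{A^2}{N} k^{1-\gamma} m^{-\gamma},$$

where each of these edges has balance ratio

$$R = \frac{k}{m}$$

for the combination (k, m). Therefore, substituting $m = k/R$, the number of edges whose target vertices have in-degree k and whose edge balance ratio is R is

$$\frac{A}{N} k\, m^{-\gamma} \cdot A k^{-\gamma} = \frac{A^2}{N} k^{1-2\gamma} R^{\gamma}$$

altogether.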



B. The Proof of Theorem 1 

Proof: The four sections of the piecewise power law are proved separately, one for each case.

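1) The Case R far larger than 1: For the interval $[a^s, a^{s+1}]$ with a large s, the inequality

$$a^s < \frac{k}{m} < a^{s+1}$$

holds for a fixed k. Therefore,

$$\frac{k}{a^{s+1}} < m < \frac{k}{a^s}.$$

Each pair (m, k) contributes

$$\frac{A^2}{N} k^{1-2\gamma} \left(\frac{k}{m}\right)^{\gamma} = \frac{A^2}{N} k^{1-\gamma} m^{-\gamma}$$

edges within this interval.

For k no less than $a^{s+1}$, the number of edges is approximately

$$\frac{A^2}{N} k^{1-\gamma} \sum_{m=k/a^{s+1}}^{k/a^{s}} m^{-\gamma} \approx \frac{A^2}{N} \frac{a^{\gamma-1}-1}{\gamma-1} a^{s(\gamma-1)} k^{2-2\gamma},$$

where the sum is approximated by the integral. Summing over k, the result is approximately

$$\frac{A^2}{N} \frac{a^{\gamma-1}-1}{\gamma-1} a^{s(\gamma-1)} \sum_{k=\lceil a^{s+1} \rceil}^{\infty} k^{2-2\gamma} \approx \frac{A^2}{N} \frac{a^{3-2\gamma}\left(a^{\gamma-1}-1\right)}{(\gamma-1)(2\gamma-3)} a^{s(2-\gamma)}. \tag{1}$$

For k satisfying $a^s < k < a^{s+1}$, every m with $1 \le m < k/a^s$ falls into the interval, so the number of edges in this interval is approximately

$$\frac{A^2}{N} k^{1-\gamma} \sum_{m=1}^{k/a^{s}} m^{-\gamma} \approx \frac{A^2}{N} \frac{1}{\gamma-1} \left(k^{1-\gamma} - a^{s(\gamma-1)} k^{2-2\gamma}\right).$$

Summing over k leads to

$$\frac{A^2}{N} \frac{1}{\gamma-1} \sum_{k=\lceil a^{s} \rceil}^{\lfloor a^{s+1} \rfloor} \left(k^{1-\gamma} - a^{s(\gamma-1)} k^{2-2\gamma}\right) \approx \frac{A^2}{N} \frac{1}{\gamma-1} \left(\frac{1-a^{2-\gamma}}{\gamma-2} - \frac{1-a^{3-2\gamma}}{2\gamma-3}\right) a^{s(2-\gamma)}. \tag{2}$$

The sum of (1) and (2) is

$$\frac{A^2}{N} \frac{1-a^{2-\gamma}}{(\gamma-2)(2\gamma-3)} a^{s(2-\gamma)}.$$

It can be regarded as the value at $R = a^s$. In other words, the edge amount N(R) with respect to the balance ratio R can be written as

$$N(R) \approx \frac{A^2}{N} \frac{1-a^{2-\gamma}}{(\gamma-2)(2\gamma-3)} R^{2-\gamma},$$

which obeys the power law with scaling exponent $2-\gamma$.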
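The slope 2 − γ of this first case can be checked numerically without the sum-to-integral steps by summing the Lemma 1 count exactly over all pairs (k, m) whose ratio lies in [a^s, a^(s+1)). The sketch drops the common factor A²/N and truncates the target in-degree at k_max, both assumptions of this check, and its function names are hypothetical. With γ = 2.3 and a = 2, the local slope between neighbouring large-s intervals should be close to 2 − γ = −0.3.

```python
import math


def interval_mass(gamma, a, s, k_max):
    """Exact sum of Lemma 1's count k**(1 - gamma) * m**(-gamma) (A**2/N dropped)
    over all pairs with a**s <= R = k/m < a**(s + 1) and target in-degree k <= k_max."""
    lo, hi = a ** s, a ** (s + 1)
    total = 0.0
    for k in range(1, k_max + 1):
        # a**s <= k/m < a**(s+1)  <=>  k/a**(s+1) < m <= k/a**s
        m_first = math.floor(k / hi) + 1
        m_last = math.floor(k / lo)
        if m_last >= m_first:
            inner = sum(m ** -gamma for m in range(m_first, m_last + 1))
            total += k ** (1 - gamma) * inner
    return total


def local_slope(gamma, a, s, k_max):
    """Log-log slope of the interval mass between intervals s and s + 1."""
    ratio = interval_mass(gamma, a, s + 1, k_max) / interval_mass(gamma, a, s, k_max)
    return math.log(ratio) / math.log(a)


if __name__ == "__main__":
    gamma = 2.3
    slope = local_slope(gamma, a=2.0, s=5, k_max=4000)
    print(f"local slope {slope:.3f} vs 2 - gamma = {2 - gamma:.3f}")
```

The small remaining deviation comes from the discreteness of k at moderate s and from the truncation, both of which shrink as s and k_max grow.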

2) The Case R slightly larger than 1: If the interval $[a^s, a^{s+1}]$ has a small s, the sum oscillates strongly from one interval to the next, so only the peak values are discussed. The peaks occur at integer edge balance ratios. For an integer R, the contributing pairs are $(k, m) = (iR, i)$ with $i = 1, 2, \ldots$. According to the expression $\frac{A^2}{N} k^{1-2\gamma} R^{\gamma}$ for each k, the value at edge balance ratio R is

$$N(R) = \sum_{i=1}^{\infty} \frac{A^2}{N} R^{\gamma} (iR)^{1-2\gamma} \approx \frac{A^2}{N} \frac{1}{2\gamma-2} R^{1-\gamma}. \tag{3}$$

It demonstrates that the peak values obey the power law with scaling exponent $1-\gamma$.

In particular, setting R = 1 in (3) gives the number of edges whose balance ratio is exactly 1, which is about

$$\frac{A^2}{N} \frac{1}{2\gamma-2}.$$

3) The Case R far smaller than 1: For the interval $[a^{-(s+1)}, a^{-s}]$ with a large s, m should satisfy the inequality

$$a^{-(s+1)} < \frac{k}{m} < a^{-s}$$

for a fixed k. Therefore,

$$k a^{s} < m < k a^{s+1}.$$

The contribution of each combination (m, k) within this interval is

$$\frac{A^2}{N} k^{1-2\gamma} \left(\frac{k}{m}\right)^{\gamma} = \frac{A^2}{N} k^{1-\gamma} m^{-\gamma}$$

edges. The number of edges for each k is approximately

$$\frac{A^2}{N} k^{1-\gamma} \sum_{m=k a^{s}}^{k a^{s+1}} m^{-\gamma} \approx \frac{A^2}{N} \frac{1-a^{1-\gamma}}{\gamma-1} a^{s(1-\gamma)} k^{2-2\gamma}.$$

Then summing over k gives

$$\frac{A^2}{N} \frac{1-a^{1-\gamma}}{\gamma-1} a^{s(1-\gamma)} \sum_{k=1}^{\infty} k^{2-2\gamma} \approx \frac{A^2}{N} \frac{1-a^{1-\gamma}}{(\gamma-1)(2\gamma-3)} a^{s(1-\gamma)}.$$

The sum above is the number of edges at balance ratio $R = a^{-s}$, so it can be written as

$$N(R) \approx \frac{A^2}{N} \frac{1-a^{1-\gamma}}{(\gamma-1)(2\gamma-3)} R^{\gamma-1}.$$

In other words, the number of edges obeys the power law with scaling exponent $\gamma-1$.
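For this third case, the exact interval sums can be compared with the closed form derived above, again without the factor A²/N and with the target in-degree truncated at k_max; the function names are hypothetical. If the derivation is right, the ratio of the two stays nearly constant across s, which reproduces the exponent γ − 1. The ratio itself differs from 1 because Σ_{k≥1} k^(2−2γ) was replaced by 1/(2γ − 3), which is an instance of the intercept error discussed in Section V.

```python
import math


def exact_mass_below(gamma, a, s, k_max):
    """Lemma 1 count without A**2/N, summed over pairs with
    a**-(s+1) <= R = k/m < a**-s, i.e. k * a**s < m <= k * a**(s+1)."""
    total = 0.0
    for k in range(1, k_max + 1):
        m_first = math.floor(k * a ** s) + 1
        m_last = math.floor(k * a ** (s + 1))
        inner = sum(m ** -gamma for m in range(m_first, m_last + 1))
        total += k ** (1 - gamma) * inner
    return total


def closed_form_below(gamma, a, s):
    """Case 3 result without A**2/N:
    (1 - a**(1-gamma)) / ((gamma-1)(2*gamma-3)) * a**(s*(1-gamma))."""
    return ((1 - a ** (1 - gamma)) / ((gamma - 1) * (2 * gamma - 3))
            * a ** (s * (1 - gamma)))


if __name__ == "__main__":
    for s in (5, 6, 7):
        ratio = exact_mass_below(2.3, 2.0, s, 60) / closed_form_below(2.3, 2.0, s)
        print(f"s = {s}: exact / closed form = {ratio:.3f}")
```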
4) The Case R slightly smaller than 1: For the interval with a small s, the edge amount oscillates, so only the peak values are considered. Similarly, the peak values are at $R = 1/j$ for each integer j, where the contributing pairs are $(k, m) = (k, jk)$. According to the expression $\frac{A^2}{N} k^{1-2\gamma} R^{\gamma}$, the value at $R = 1/j$ is

$$N\!\left(\frac{1}{j}\right) = \sum_{k=1}^{\infty} \frac{A^2}{N} k^{1-2\gamma} j^{-\gamma} \approx \frac{A^2}{N} \frac{1}{2\gamma-2} j^{-\gamma} = \frac{A^2}{N} \frac{1}{2\gamma-2} R^{\gamma}.$$

The peak values obey the power law with scaling exponent $\gamma$.

When R equals 1 (j = 1), the above result is the number of edges with balance ratio equal to 1, which is the same as the result obtained in Case 2. ■
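The peak values of the second and fourth cases can be checked in the same spirit. Edges with R = r for an integer r come from pairs (k, m) = (ir, r), i.e. (k, m) = (ir, i); edges with R = 1/j for an integer j come from pairs (k, jk). The sketch sums the Lemma 1 count over these pairs, without A²/N and with the target in-degree truncated at k_max; the names are hypothetical. It recovers the exponents 1 − γ and γ and gives equal values at R = 1.

```python
import math


def peak_above(gamma, r, k_max):
    """Edges (without A**2/N) with balance ratio exactly r, for integer r >= 1:
    pairs (k, m) = (i*r, i), each contributing k**(1 - 2*gamma) * r**gamma."""
    return sum(k ** (1 - 2 * gamma) * r ** gamma for k in range(r, k_max + 1, r))


def peak_below(gamma, j, k_max):
    """Edges (without A**2/N) with balance ratio exactly 1/j, for integer j >= 1:
    pairs (k, m) = (k, j*k), each contributing k**(1 - 2*gamma) * (1/j)**gamma."""
    return sum(k ** (1 - 2 * gamma) * (1.0 / j) ** gamma for k in range(1, k_max + 1))


if __name__ == "__main__":
    gamma, k_max = 2.3, 10_000
    # slope between R = 2 and R = 4, expected 1 - gamma
    up = math.log(peak_above(gamma, 4, k_max) / peak_above(gamma, 2, k_max)) / math.log(2.0)
    # slope between R = 1/2 and R = 1/4, expected gamma
    down = math.log(peak_below(gamma, 4, k_max) / peak_below(gamma, 2, k_max)) / math.log(0.5)
    print(f"R > 1 peaks: {up:.4f}, R < 1 peaks: {down:.4f}")
```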

References 



[1] M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167-256, 2003.

[2] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D. U. Hwang, "Complex networks: Structure and dynamics," Physics Reports, vol. 424, no. 4-5, pp. 175-308, Feb. 2006.

[3] S. H. Strogatz, "Exploring complex networks," Nature, vol. 410, pp. 268-276, 2001.

[4] M. E. J. Newman, D. J. Watts, and A. L. Barabasi, The Structure and Dynamics of Networks, Princeton University Press, 2006.

[5] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the internet topology," ACM SIGCOMM Computer Communication Review, vol. 29, no. 4, pp. 251-262, 1999.

[6] A. W. Wolfe, "Social network analysis: Methods and applications," American Ethnologist, vol. 24, no. 1, pp. 219-220, Feb. 1997.

[7] S. Redner, "How popular is your paper? An empirical study of the citation distribution," The European Physical Journal B, vol. 4, pp. 131-134, 1998.

[8] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabasi, "The large-scale organization of metabolic networks," Nature, vol. 407, pp. 651-654, 2000.

[9] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, pp. 440-442, 1998.

[10] D. J. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness, Princeton University Press, Princeton, NJ, 1999.

[11] A. L. Barabasi and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, pp. 509-512, 1999.

[12] A. L. Barabasi, R. Albert, and H. Jeong, "Scale-free characteristics of random networks: the topology of the world wide web," Physica A: Statistical Mechanics and its Applications, vol. 281, no. 1-4, pp. 69-77, 2000.

[13] G. Caldarelli, Scale-free Networks: Complex Webs in Nature and Technology, Oxford University Press, USA, 2007.

[14] R. Albert and A. L. Barabasi, "Statistical mechanics of complex networks," Reviews of Modern Physics, vol. 74, no. 1, pp. 47-97, 2002.

[15] C. Song, S. Havlin, and H. A. Makse, "Self-similarity of complex networks," Nature, vol. 433, pp. 392-395, 2005.

[16] N. Kashtan, S. Itzkovitz, R. Milo, and U. Alon, "Topological generalizations of network motifs," Physical Review E, vol. 70, 031909, 2004.

[17] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, pp. 1551-1555, 2002.

[18] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Technical Report, Stanford InfoLab, 1998.

[19] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3-5, pp. 75-174, Feb. 2010.

[20] M. Rosvall and C. T. Bergstrom, "Maps of random walks on complex networks reveal community structure," Proceedings of the National Academy of Sciences, vol. 105, no. 4, pp. 1118-1123, 2008.

[21] L. Prignano, Y. Moreno, and A. Díaz-Guilera, "Exploring complex networks by means of adaptive walkers," arXiv:1203.1439, 2012.

[22] L. Getoor and C. P. Diehl, "Link mining: A survey," ACM SIGKDD Explorations Newsletter, vol. 7, no. 2, pp. 3-12, 2005.

[23] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019-1031, May 2007.

[24] L. Lü and T. Zhou, "Link prediction in complex networks: A survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150-1170, 2011.

[25] T. S. Evans and R. Lambiotte, "Line graphs, link partitions and overlapping communities," Physical Review E, vol. 80, 016105, 2009.

[26] Y. Y. Ahn, J. P. Bagrow, and S. Lehmann, "Link communities reveal multiscale complexity in networks," Nature, vol. 466, pp. 761-764, Aug. 2010.

[27] V. K. Balakrishnan, Schaum's Outline of Graph Theory, McGraw-Hill, New York, USA, 1997.

[28] M. E. J. Newman, "Assortative mixing in networks," Physical Review Letters, vol. 89, no. 20, 208701, 2002.

[29] M. E. J. Newman, "Mixing patterns in networks," Physical Review E, vol. 67, 026126, 2003.

[30] J. G. Foster, D. V. Foster, P. Grassberger, and M. Paczuski, "Edge direction and the structure of networks," PNAS, vol. 107, no. 24, pp. 10815-10820, 2010.

[31] M. Piraveenan, M. Prokopenko, and A. Y. Zomaya, "Local assortativeness in scale-free networks," Europhysics Letters, vol. 84, no. 2, 28002, 2008.

[32] M. Piraveenan, K. S. K. Chung, and S. Uddin, "Assortativity of links in directed networks," http://elrond.informatik.tu-freiberg.de

[33] A. Clauset, C. Moore, and M. E. J. Newman, "Hierarchical structure and the prediction of missing links in networks," Nature, vol. 453, no. 7191, pp. 98-101, 2008.

[34] A. S. Maiya and T. Y. Berger-Wolf, "Inferring the maximum likelihood hierarchy in social networks," IEEE International Conference on Computational Science and Engineering, vol. 4, pp. 245-250, Aug. 2009.

[35] A. Trusina, S. Maslov, P. Minnhagen, and K. Sneppen, "Hierarchy measures in complex networks," Physical Review Letters, vol. 92, no. 17, 178702, 2004.

[36] E. Mones, L. Vicsek, and T. Vicsek, "Hierarchy measure for complex networks," PLoS ONE, vol. 7, no. 3, e33799, 2012.

[37] M. Gupte, P. Shankar, J. Li, S. Muthukrishnan, and L. Iftode, "Finding hierarchy in directed online social networks," Proceedings of the 20th International World Wide Web (WWW) Conference, pp. 557-566, Hyderabad, India, 2011.

[38] http://www.twitter.com

[39] http://weibo.com

[40] A. Java, X. Song, T. Finin, and B. Tseng, "Why we twitter: Understanding microblogging usage and communities," Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis, pp. 56-65, New York, NY, USA, 2007.

[41] H. Kwak, C. Lee, H. Park, and S. Moon, "What is Twitter, a social network or a news media?" Proceedings of the 19th International World Wide Web (WWW) Conference, pp. 26-30, Raleigh, NC, USA, Apr. 2010.

[42] Z. Chen, P. Liu, X. Wang, and Y. Gu, "Follow whom? Chinese users have different choice," arXiv, 2012.

[43] C. G. Lopes and A. H. Sayed, "Diffusion least-mean squares over adaptive networks: Formulation and performance analysis," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3122-3136, Jul. 2008.

[44] A. H. Sayed, "Diffusion adaptation over networks," arXiv:1205.4220v1, 2012.

[45] N. Takahashi and I. Yamada, "Link probability control for probabilistic diffusion least-mean squares over resource constrained networks," Proc. IEEE ICASSP, pp. 3518-3521, Dallas, TX, Mar. 2010.

[46] S. Sardellitti, M. Giona, and S. Barbarossa, "Fast distributed average consensus algorithms based on advection-diffusion processes," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 826-842, Feb. 2010.

[47] S. Theodoridis, K. Slavakis, and I. Yamada, "Adaptive learning in a world of projections: A unifying framework for linear and nonlinear classification and regression tasks," IEEE Signal Processing Magazine, vol. 28, no. 1, pp. 97-123, Jan. 2011.



